在3D动作识别中,存在骨骼模式之间的丰富互补信息。然而,如何建模和利用这些信息仍然是一个充满挑战的3D动作表示学习的问题。在这项工作中,我们将交叉模式相互作用作为双向知识蒸馏问题。不同于经典的蒸馏解决方案,这些解决方案将固定和预训练的教师的知识转移到学生中,在这项工作中,知识在模式之间不断更新和双向蒸馏。为此,我们提出了一个新的跨模式相互蒸馏(CMD)框架,并采用以下设计。一方面,引入了相邻的相似性分布来对每种模式中学习的知识进行建模,其中关系信息自然适合对比框架。另一方面,不对称的配置用于教师和学生来稳定蒸馏过程并在模式之间传递高信心信息。通过派生,我们发现以前作品中的跨模式阳性采矿可以被视为我们CMD的退化版本。我们对NTU RGB+D 60,NTU RGB+D 120和PKU-MMD II数据集执行了广泛的实验。我们的方法的表现优于现有的自我监督方法,并设置了一系列新记录。该代码可在以下网址找到:https://github.com/maoyunyao/cmd
translated by 谷歌翻译
As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
translated by 谷歌翻译
Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.
translated by 谷歌翻译
Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work focuses on the former. Previous methods build the network with several modules like CNN, LSTM and Attention. Recent methods combine the Transformer with these modules for better performance. However, it requires tedious optimization skills to train a network composed of mixed modules, making these methods inconvenient to be used in practice. In this paper, we propose to design \emph{pure Transformer-based networks} for deep RL, aiming at providing off-the-shelf backbones for both the online and offline settings. Specifically, the Transformer in Transformer (TIT) backbone is proposed, which cascades two Transformers in a very natural way: the inner one is used to process a single observation, while the outer one is responsible for processing the observation history; combining both is expected to extract spatial-temporal representations for good decision-making. Experiments show that TIT can achieve satisfactory performance in different settings, consistently.
translated by 谷歌翻译
Abstractive dialogue summarization has long been viewed as an important standalone task in natural language processing, but no previous work has explored the possibility of whether abstractive dialogue summarization can also be used as a means to boost an NLP system's performance on other important dialogue comprehension tasks. In this paper, we propose a novel type of dialogue summarization task - STRUctured DiaLoguE Summarization - that can help pre-trained language models to better understand dialogues and improve their performance on important dialogue comprehension tasks. We further collect human annotations of STRUDEL summaries over 400 dialogues and introduce a new STRUDEL dialogue comprehension modeling framework that integrates STRUDEL into a graph-neural-network-based dialogue reasoning module over transformer encoder language models to improve their dialogue comprehension abilities. In our empirical experiments on two important downstream dialogue comprehension tasks - dialogue question answering and dialogue response prediction - we show that our STRUDEL dialogue comprehension model can significantly improve the dialogue comprehension performance of transformer encoder language models.
translated by 谷歌翻译
Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet. However, existing models focus only on optimizing the documentation code pairs by embedding them into latent space, without the association of external knowledge. In this paper, we propose a generation-augmented query expansion framework. Inspired by the human retrieval process - sketching an answer before searching, in this work, we utilize the powerful code generation model to benefit the code retrieval task. Specifically, we demonstrate that rather than merely retrieving the target code snippet according to the documentation query, it would be helpful to augment the documentation query with its generation counterpart - generated code snippets from the code generation model. To the best of our knowledge, this is the first attempt that leverages the code generation model to enhance the code retrieval task. We achieve new state-of-the-art results on the CodeSearchNet benchmark and surpass the baselines significantly.
translated by 谷歌翻译
Multivariate time series forecasting constitutes important functionality in cyber-physical systems, whose prediction accuracy can be improved significantly by capturing temporal and multivariate correlations among multiple time series. State-of-the-art deep learning methods fail to construct models for full time series because model complexity grows exponentially with time series length. Rather, these methods construct local temporal and multivariate correlations within subsequences, but fail to capture correlations among subsequences, which significantly affect their forecasting accuracy. To capture the temporal and multivariate correlations among subsequences, we design a pattern discovery model, that constructs correlations via diverse pattern functions. While the traditional pattern discovery method uses shared and fixed pattern functions that ignore the diversity across time series. We propose a novel pattern discovery method that can automatically capture diverse and complex time series patterns. We also propose a learnable correlation matrix, that enables the model to capture distinct correlations among multiple time series. Extensive experiments show that our model achieves state-of-the-art prediction accuracy.
translated by 谷歌翻译
We introduce \textsc{PoliteRewrite} -- a dataset for polite language rewrite which is a novel sentence rewrite task. Compared with previous text style transfer tasks that can be mostly addressed by slight token- or phrase-level edits, polite language rewrite requires deep understanding and extensive sentence-level edits over an offensive and impolite sentence to deliver the same message euphemistically and politely, which is more challenging -- not only for NLP models but also for human annotators to rewrite with effort. To alleviate the human effort for efficient annotation, we first propose a novel annotation paradigm by a collaboration of human annotators and GPT-3.5 to annotate \textsc{PoliteRewrite}. The released dataset has 10K polite sentence rewrites annotated collaboratively by GPT-3.5 and human, which can be used as gold standard for training, validation and test; and 100K high-quality polite sentence rewrites by GPT-3.5 without human review. We wish this work (The dataset (10K+100K) will be released soon) could contribute to the research on more challenging sentence rewrite, and provoke more thought in future on resource annotation paradigm with the help of the large-scaled pretrained models.
translated by 谷歌翻译
Text style transfer aims to alter the style of a sentence while preserving its content. Due to the lack of parallel corpora, most recent work focuses on unsupervised methods and often uses cycle construction to train models. Since cycle construction helps to improve the style transfer ability of the model by rebuilding transferred sentences back to original-style sentences, it brings about a content loss in unsupervised text style transfer tasks. In this paper, we propose a novel disentanglement-based style transfer model StyleFlow to enhance content preservation. Instead of the typical encoder-decoder scheme, StyleFlow can not only conduct the forward process to obtain the output, but also infer to the input through the output. We design an attention-aware coupling layers to disentangle the content representations and the style representations of a sentence. Besides, we propose a data augmentation method based on Normalizing Flow to improve the robustness of the model. Experiment results demonstrate that our model preserves content effectively and achieves the state-of-the-art performance on the most metrics.
translated by 谷歌翻译
Open-retrieval conversational machine reading comprehension (OCMRC) simulates real-life conversational interaction scenes. Machines are required to make a decision of "Yes/No/Inquire" or generate a follow-up question when the decision is "Inquire" based on retrieved rule texts, user scenario, user question, and dialogue history. Recent studies explored the methods to reduce the information gap between decision-making and question generation and thus improve the performance of generation. However, the information gap still exists because these pipeline structures are still limited in decision-making, span extraction, and question rephrasing three stages. Decision-making and generation are reasoning separately, and the entailment reasoning utilized in decision-making is hard to share through all stages. To tackle the above problem, we proposed a novel one-stage end-to-end framework, called Entailment Fused-T5 (EFT), to bridge the information gap between decision-making and generation in a global understanding manner. The extensive experimental results demonstrate that our proposed framework achieves new state-of-the-art performance on the OR-ShARC benchmark.
translated by 谷歌翻译